topr comes with three example GWAS datasets: one on ulcerative colitis from the UK Biobank (UC_UKBB), and two on Crohn’s disease (CD_UKBB and CD_FINNGEN) from the UK Biobank and FinnGen, respectively. topr uses gene and exon datasets from Ensembl (GRCh38.pxx) (ENSGENES and ENSEXONS).
See the topr reference for more details on the built-in datasets.
Input datasets must include at least three columns (CHROM, POS and P). Column naming is flexible: for example, the chromosome column can be labelled either chr or chrom, and column names are case-insensitive.
topr has 3 built-in GWAS datasets. Take a look at the Crohn’s disease GWAS (CD_UKBB) by issuing the following command:
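As a minimal sketch of the column requirement, the base R snippet below builds a toy data frame and checks it for the three required columns. The data values and the helper name has_required_columns are hypothetical and not part of topr; the check only mirrors the naming rule described above.

```r
# Hypothetical toy association results; the values are made up for illustration.
gwas <- data.frame(
  chrom = c("chr1", "chr1", "chr2"),
  pos   = c(67138906, 67259979, 1500000),
  p     = c(3e-12, 0.04, 2e-9)
)

# Hypothetical helper (not a topr function): check for the three required
# columns, ignoring case and accepting either chr or chrom.
has_required_columns <- function(df) {
  cols <- toupper(names(df))
  any(cols %in% c("CHROM", "CHR")) && "POS" %in% cols && "P" %in% cols
}

has_required_columns(gwas)
```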
CD_UKBB %>%
head(n = 6)
The CHROM column can be given with or without the chr prefix (e.g. chr1 or 1).
topr’s key plotting functions are manhattan() and regionplot().
Get an overview of the Crohn’s disease association results on a Manhattan plot:
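If you combine topr data with other tables, it can help to bring chromosome labels into one style first. The base R sketch below strips an optional chr prefix; strip_chr_prefix is a hypothetical helper, not a topr function.

```r
# Hypothetical helper: remove an optional, case-insensitive "chr" prefix
# so that "chr1" and "1" compare as equal.
strip_chr_prefix <- function(chrom) {
  sub("^chr", "", as.character(chrom), ignore.case = TRUE)
}

strip_chr_prefix(c("chr1", "1", "CHRX"))
```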
manhattan(CD_UKBB)
Label association peaks with the nearest gene and add plot title:
manhattan(CD_UKBB, annotate = 5e-09, title = "Crohn's disease")
Add genes of interest to the plot:
genes = c("IL23R", "NOTCH4", "NOD2", "JAK2", "TTC33")
manhattan(CD_UKBB, annotate = 5e-09, title = "Crohn's disease", highlight_genes = genes)
Change the position of the genes of interest on the plot (the default value is 1):
manhattan(CD_UKBB, annotate = 5e-09, title = "Crohn's disease", highlight_genes = genes,
highlight_genes_ypos = 1.5)
Remove the significance threshold:
manhattan(CD_UKBB, annotate = 5e-09, title = "Crohn's disease", highlight_genes = genes,
highlight_genes_ypos = 1.5, sign_thresh = NULL)
Set multiple significance thresholds in different colors:
manhattan(CD_UKBB, annotate = 5e-09, title = "Crohn's disease", highlight_genes = genes,
highlight_genes_ypos = 1.5, sign_thresh = c(1e-08, 1e-09, 1e-10), sign_thresh_color = c("red",
"darkorange", "magenta"))
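To see where these threshold lines should fall, the p-values can be converted to the scale of the y-axis. The sketch below assumes the Manhattan plot uses the conventional -log10(P) scale; the variable names are illustrative only.

```r
# The three thresholds passed to sign_thresh in the call above.
sign_thresh <- c(1e-08, 1e-09, 1e-10)

# Height of each threshold line, assuming a -log10(P) y-axis.
thresh_height <- -log10(sign_thresh)
thresh_height
```

A smaller p-value threshold therefore sits higher on the plot.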
Change the scale and plot colors:
manhattan(CD_UKBB, annotate = 5e-09, title = "Crohn's disease", highlight_genes = genes,
highlight_genes_ypos = 1.5, highlight_genes_color = "magenta", color = "darkgreen",
scale = 1.2)
Change the color and angle of the labels:
manhattan(CD_UKBB, annotate = 5e-09, title = "Crohn's disease", highlight_genes = genes,
highlight_genes_ypos = 1.5, highlight_genes_color = "magenta", color = "darkgreen",
label_color = "grey40", angle = 90)
Tidy the labels by moving them with nudge_y and raising the upper limit of the y-axis with ymax. Increase region_size for sparser gene labelling:
manhattan(CD_UKBB, annotate = 5e-09, title = "Crohn's disease", highlight_genes = genes,
highlight_genes_ypos = 1.5, highlight_genes_color = "magenta", color = "darkgreen",
label_color = "grey40", angle = 90, nudge_y = 2, ymax = 30, region_size = 1e+07)
Display multiple GWASes on the same plot:
manhattan(list(UC_UKBB, CD_UKBB))
[1] "Use the legend_labels argument to change the legend labels from color names to meaningful labels! "
Set the legend labels, annotate the peaks and change the plot colors:
manhattan(list(UC_UKBB, CD_UKBB), legend_labels = c("UC UKBB", "CD UKBB"), annotate = 1e-12,
region_size = 1e+07, color = c("darkblue", "skyblue2"))
Use the ntop argument to control how many GWASes are shown at the top of the plot and how many at the bottom, and highlight the genes of interest:
manhattan(list(UC_UKBB, CD_UKBB), legend_labels = c("UC UKBB", "CD UKBB"), annotate = 1e-12,
region_size = 1e+07, color = c("darkblue", "skyblue2"), ntop = 1, highlight_genes = genes,
highlight_genes_ypos = -0.5, angle = 90, ymax = 40, ymin = -30, nudge_y = 2,
label_color = "black", scale = 1.1)
Show 3 GWASes on the same plot, using a different annotation threshold for each dataset:
manhattan(list(UC_UKBB, CD_UKBB, CD_FINNGEN), legend_labels = c("UC UKBB", "CD UKBB",
"CD FINNGEN"), annotate = c(5e-09, 5e-09, 1e-15), region_size = 1e+08, ntop = 1,
highlight_genes = genes, highlight_genes_ypos = -0.5, angle = 90, ymax = 40,
ymin = -30, nudge_y = 2, title = "Inflammatory Bowel Disease")
Zoom in on the region around the IL23R gene:
regionplot(CD_UKBB, gene = "IL23R")
[1] "Zoomed to region: chr1:67038906-67359979"
Annotate the top variant within each 100kb window.
regionplot(CD_UKBB, gene = "IL23R", annotate = 5e-09, region_size = 1e+05)
[1] "Zoomed to region: chr1:67038906-67359979"
Annotate the top variants and draw a vertical line through their positions to further highlight their position within the genes below.
regionplot(CD_UKBB, gene = "IL23R", annotate_with_vline = 5e-09, region_size = 1e+05)
[1] "Zoomed to region: chr1:67038906-67359979"
Zoom in on the IL23R gene for multiple GWASes:
regionplot(list(UC_UKBB, CD_UKBB, CD_FINNGEN), gene = "IL23R", annotate_with_vline = 5e-06,
legend_labels = c("UC UKBB", "CD UKBB", "CD FINNGEN"))
[1] "Zoomed to region: chr1:67038906-67359979"
Create a LocusZoom-like plot. Note that the input data frame must include an R2 column with pre-calculated r2 values, since topr does not calculate them.
locuszoom(R2_CD_UKBB)
[1] "Zoomed to region: 1:67043049-67303423"
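Because the r2 values must be computed beforehand, here is a minimal sketch of one common approximation: the squared Pearson correlation between genotype dosages of the lead variant and another variant. The genotype vectors are hypothetical, and dedicated LD tools are normally used for real data; this is not how the R2_CD_UKBB values are stated to have been produced.

```r
# Hypothetical genotype dosages (0, 1 or 2 copies of the alternate allele)
# for six individuals at a lead variant and at a neighbouring variant.
lead_geno  <- c(0, 1, 2, 1, 0, 2)
other_geno <- c(0, 1, 2, 0, 0, 2)

# Squared Pearson correlation of the dosages, an approximation of LD r2.
r2 <- cor(lead_geno, other_geno)^2
r2
```

Values computed this way for each variant would then be stored in the R2 column of the input data frame.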
Annotate the variants with vlines on the plot:
locuszoom(R2_CD_UKBB, annotate_with_vline = 1e-09, region_size = 1e+05)
[1] "Zoomed to region: 1:67043049-67303423"
Extract lead/index variants from the GWAS dataset (CD_UKBB):
CD_UKBB %>%
get_best_snp_per_MB()
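To illustrate the idea of picking one lead variant per megabase, the base R sketch below keeps the variant with the smallest p-value in each fixed 1 Mb bin. This is a simplification under stated assumptions, not topr's implementation: the toy data are made up, and get_best_snp_per_MB() may use a different windowing scheme and a p-value threshold.

```r
# Hypothetical toy results; the values are made up for illustration.
gwas <- data.frame(
  CHROM = c("chr1", "chr1", "chr1", "chr2"),
  POS   = c(100000, 900000, 2500000, 400000),
  P     = c(1e-10, 1e-12, 1e-9, 1e-20)
)

# Assign each variant to a fixed 1 Mb bin on its chromosome.
gwas$BIN <- paste(gwas$CHROM, gwas$POS %/% 1e6, sep = ":")

# Keep the variant with the smallest p-value in each bin.
lead <- do.call(rbind, lapply(split(gwas, gwas$BIN), function(bin) {
  bin[which.min(bin$P), ]
}))
rownames(lead) <- NULL
lead
```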
Annotate the lead/index variants with their nearest gene:
CD_UKBB %>%
get_best_snp_per_MB() %>%
annotate_with_nearest_gene()
Get genomic coordinates for a gene:
get_gene(gene_name = "IL23R")
Get SNPs within a region:
CD_UKBB %>%
get_snps_within_region(region = "chr1:67138906-67259979") %>%
head(n = 10)
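The region string follows a chr:start-end pattern. As a rough sketch of what such a filter involves, the base R snippet below parses the string and subsets a toy data frame; parse_region and the data are hypothetical, and topr's own function may handle more input variants.

```r
# Hypothetical toy results; the values are made up for illustration.
gwas <- data.frame(
  CHROM = c("chr1", "chr1", "chr2"),
  POS   = c(67138906, 68000000, 67200000),
  P     = c(3e-12, 0.04, 2e-9)
)

# Hypothetical helper: split "chr:start-end" into its three parts.
parse_region <- function(region) {
  parts <- strsplit(region, "[:-]")[[1]]
  list(chrom = parts[1], start = as.numeric(parts[2]), end = as.numeric(parts[3]))
}

r <- parse_region("chr1:67138906-67259979")

# Keep variants on the same chromosome whose position lies in [start, end].
in_region <- gwas[gwas$CHROM == r$chrom & gwas$POS >= r$start & gwas$POS <= r$end, ]
in_region
```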
Get the top variant on a chromosome:
CD_UKBB %>%
get_top_snp(chr = "chr1")